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Abstract. Despite computational superiorities, some traditional conjugate gradient 
algorithms such as Polak-Ribiére-Polyak and Hestenes-Stiefel methods generally fail to 
guarantee the descent condition. Here, in a matrix viewpoint, spectral versions of such 
methods are developed which fulfill the descent condition. The convergence of the given 
spectral algorithms is argued briefly. Afterwards, we propose an improved version of the 
nonnegative matrix factorization problem by adding penalty terms to the model, for 
controlling the condition number of one of the factorization elements. Finally, the 
computational merits of the method are examined using a set of CUTEr test problems as 
well as some random nonnegative matrix factorization models. The results typically agree 
with our analytical spectrum. 
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1. Introduction 


Scholar studies show that the introduction of conjugate gradient (CG) methods 
made a revolution in the field of numerical optimization. Requiring low memory 
and having simple iterations besides acceptable convergence, the methods have 
been extensively utilized in practical disciplines such as signal processing, 
machine learning, and neural networks training, which often appear in large-scale 
models. 


For solving the minimization problem min f(x) with a smooth real-valued 
ze 


function f, here we focus on the CG methods which their search directions can be 
formulated by 


do = —Go. ders = —PyeiiGn+1> forall k => 0, (1.1) 
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where P;,,, € IR” is a rank-one update of the identity matrix as follows: 
Pryoa =1— Pek: (1.2) 


where px, Gx € IR" are non-zero vector parameters. Notice that d,4, in (1.1) can 
be effectively determined by performing one inner product and storing three n- 
dimensional vectors. Now, for the iterative method 


Xpe1 = Xx +Sx, forall k = 0, 
with some xg € IR", where s; = @,d, with the step length a, > 0 fixed by a line 
search [30] along the direction d, given by (1.1), the choice 
(1.3) 


dk Yk 
(Py, Ix) = | ——., ee i 
Jaf (ahve 


with Vy = Oxia — Gx and gy = Vf (xx), leads to the CG method proposed by 
Hestenes and Stiefel [17] (HS). Also, by setting 
fe y, 
(Px, Gx) = ( ~ ae ), (1.4) 


Nall” Negzell 


with ||-|| standing for the Euclidean norm, we get the method given by Polak and 
Ribieére [23], and Polyak [24] (PRP). 


From this point forward, we enforce the popular strong Wolfe conditions in the 
line search, i.e. 


f Gy, + ay d,) — f (xx) = Sa;,gi d,, (1.5) 
Vi (xx + a,d,)' d,| = —ogid,, (1.6) 


Where 0 < 6 < g < 1, ensuring pj d,, > 0 for the choices (1.3) and (1.4) which is 
required in our future analysis. 


Despite efficiency in the computational viewpoint among the traditional CG 
techniques [3], PRP and HS iterations may generate uphill directions. To combat 
this defect, scholars made extensive efforts to open up modification ways to get 
the descent versions of the methods. In this direction, a fundamental step has been 
taken by developing three-term CG algorithms principally by Zhang, Zhou and Li 
[37, 38]. In another class of modifications, researchers picked up on the thread 
that introduced by Dai and Liao [10] to gain descent extensions of the methods; 
see for example [5, 7, 16]. Spectral extensions of the PRP and HS methods also 
have attracted significant attention to get the descent property; as examples take a 
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look at the analyses carried out by Andrei [4], Dong et al. [12], Wan, Yang and 
Wang [33], Wan et al. [32], Faramarzi and Amini [14], Jian et al. [18], Zhang and 
Dan [39], Du and Liu [13], Wang et al. [34], Yu, Guan and Chen [36], and Sun et 
al. [31]. 


To make progress in the class of spectral HS/PRP algorithms, here we bring 
about change in the classical analyses of the literature in the sense of utilizing 
matrix aspects, as will be discussed in the next section (Section 2). We present our 
modified model for the nonnegative matrix factorization problem in Section 3. We 
share our numerical experiments on a category of benchmark test problems as 
well as some randomly generated cases of the nonnegative matrix factorization 
problem in Section 4. Ultimately, in Section 5 we present the concluding notes. 


2. On aclass of spectral descent conjugate gradient algorithms 
Spectral CG algorithms have been principally introduced by Birgin and Martinez 


[8], reporting improved effects on the performance of the methods. Motivated by 
the results of [8], here we use a scalar scaling of the identity matrix in (1.2) as 


= (2.1) 
Desi = Mel — PEGE, 


where the positive scalar fly is called the spectral parameter, and then, we define 
the spectral CG directions by 


(2.2) 
do = —Go. Agia = —Oxn419 x41, for all k = 0. 


Now, to avoid generating uphill search directions as an important theoretically 
troubling issue for several traditional CG algorithms with the search direction 


matrix format (1.2), we compute py in (2.1) to gain the descent property. Since 


d3.g9 = —IIgoll? < 0, and, for all k > 0, 


(2.3) 


T 
T _ T = T Qr+itQpes 
Gi Ges = —G9+1 9k Gks1 = —Ik+1 3 Skt, 


one possibility is to make the symmetric matrix 
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O+1+ Of 1 1 
Hy = = yl — SPA — 5 UP 


to be positive definite. So, following the instructions of [6], what it takes is to 


determine the eigenvalues of Flz+4. 


Since there exists a set of m—2 mutually orthogonal vectors being also 
orthogonal to px and qx, the spectral parameter pl; is an eigenvalue of Fy44 with 
the multiplicity m— 2. We pursue our analysis with the aim of finding the two 
other eigenvalues of Hy,ydenoted by Tj and T;. In this regard, because the trace 


of Fly44 and the sum of its eigenvalues are equal, we get 
(2.4) 


T]; +1 = 2M, — Pek. 


On the other hand, since tr(#t a aoe, 441) = ||Hg+1/]? with |l-||- standing for the 


Frobenius norm, being also equal to sum of the squared eigenvalues of Flz,44, we 
can write 


(2.5) 
- 2 1 ae 
Te +t = 2n; —2(Pide) Me t+ (Ped) +5 llPell*llgell?. 


Now, from (2.4) and (2.5), we get 


(2.6) 
= 1 2 1 
TT = HE — (Vide) Me + ri (piqn) — ; ell? axl. 


Thus, from (2.4) and (2.6), we should solve the following quadratic equation: 
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2 q 
Th — (2m — PEOu)Te + HE (PEI) Me +5 (PEOe) — 5 llPell*llaell? = 0. 


to determine Tj, and tj. Indeed, after a series of classic algebraic manipulations, 
we obtain 


1 } 
Te = 3 (2M — PLA + lIPull lull). 


As we know, imposing positiveness on the smallest eigenvalue of Ff;,44 makes 
it positive definite. From this fact and since we can see that T > fy = Tj, next 


we plan to find fly in such a way that T > 0. So, we should have 


1 
Hy > 5 PEM + [pel llgxll). 
Thus, as a result of the above analysis, the following class of two-parameter 


choices for ffx is given: 


(2.7) 
Hy = a(ph ax) + bilpellliqell. 


Where a, b > > guaranteeing the descent condition for the spectral CG methods 
with the directions of the general structure (2.2). 


Here, abbreviations SHS and SPRP are used to represent the scaled CG 


algorithms respectively with the choices (1.3) and (1.4) adopted in (2.2), with py 
defined by (2.7). For SHS, we establish the following result. 
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Lemma 2.1. If in (2.7) we have a> 5+ €, for some constant € > 0, then 
directions of SHS fulfill the sufficient descent condition, i.e. 


(2.8) 
di.9x = —pllgx|l’, for all k = 0, 


for some p > 0. 


Proof. By taking into account the relation (2.3), we get 
Gis 9ev1 = —GievrHesrGierr < —T ll Geral? < —EpE ell Gel. 


Now, from (1.3) we have pi.qy = 1 which leads to dt, 9x41 = —€ll@xsall?. 
Next, we concisely address the convergence of SHS and SPRP using the routine 

measures of the literature. Henceforth, we suppose that the following assumption 

holds. 

Assumption 2.1. For an arbitrary Xg € IR”, @ = {x: f(x) = f(xg)} is a bounded 


set and in some neighborhood S of 2, Vf (x) is Lipschitz continuous; that is, 


(2.9) 
IV F(&) — VF) || = L\lx— XI], for all x, XE S, 


for some positive constant LE. 


Now, if fis strongly convex [30] on the neighborhood S of , then, from (2.1), 
(2.2), and (2.9), along with Theorem 1.3.16 of [30], the sequence {||d,||},~9 of 
SHS is bounded above. Similarly, for SPRP, if (2.8) holds and py is bounded 


above (ensured by setting p< min{p,. M}, where M is an enough large 
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positive constant), from Lemma 3.1 of [37] we get the boundedness of the 


sequence {||d;,||},~9. Eventually, for both of the SHS and SPRP methods with the 
strong Wolfe conditions (1.5) and (1.6), Lemma 3.1 of [29] leads to the 


convergence in the sense of lim inf ||g;|| = 0. 
k-05 


3. An improved model for the nonnegative matrix factorization problem 


Among the key characteristics of the digital era, is the impact of dimensionality 
reduction methodologies on the analysis of large data sets. In recent years, 
nonnegative matrix factorization (NMF) has attracted the attention of many 
researchers as a simple and easily interpretable technique for extracting hidden 
and important features of the data [9, 20, 25, 28, 35]. Classic NMF involves 
(approximately) decomposing an arbitrary matrix A € R'™”*" into two matrices 
W € R™*" and H € R’™*" in the sense of A » WH, under the assumption that the 
elements of A, W and H are nonnegative and r « min {m,n}. Considering a 
nonnegative matrix A (ie. A=O which means that the entries of A are 
nonnegative), modeling the NMF problem can be accomplished in the following 
manner: 


min F(W, H) = = ||A — WHI, s.t. WH > 0. (3.1) 


Since being nonconvex, it is a reasonable idea to transform (3.1) into some 
possible convex subproblems, thereby benefiting from the advantages of the 
convex optimization tools. By maintaining one matrix factor constant, the other 
matrix factor can be calculated by solving a least-squares problem. Based on this 
fact, ANLS (alternating nonnegative least-squares) [22] has been traditionally 
regarded as an important approach, technically characterized by solving the 
following two least-squares models alternatively: 


H®*1_arp min F(W*, H), (3.2) 
W*t1l-arg min F(W, H**?), (3.3) 


for all k = 0, starting by some initial approximations for W and H. 
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As known, well-conditioning is an influential issue in the field of matrix 
computations. So, here we focus on the condition number of the (mostly) positive 
definite matrix } = HH™ of the dimension r X r in the model (3.1) to possibly 
combat the collinearity between the rows of H, and as a result, to achieve the 
computational stability for at least one of the factorization elements. Therefore, 
the following improved version of the NMF model (3.1) is proposed: 


5(W, H) = =||A — WH|[2 + Ex(32), s.t. WH = 0, (3.4) 


where € = O is the penalty parameter and x(.) stands for the spectral condition 
number [26, 27]. 


Since generically computing x(H) in the model (3.4) is costly in the 
computational viewpoint as well as the CPU time, diagonal approximations of H 
are more preferable. So, we consider 


H = D = diag(Dj, D3, --- ,D;), 
where 


2. : 
a t=1, YN ge 


D; = Yj 
Notably, the above estimation is derived by 


D*=arg min || — D|lz, 


where D* € R'™" refers to the collection of all diagonal matrices whose elements 
are nonnegative. In addition, we employ the following function, devised based on 
the relation between the geometric and arithmetic means to assess the condition 
number in (3.4): 


(A) = “9 — ‘/aer(a), 
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for an arbitrary positive definite matrix A € R™*". Consequently, if w(.) (rather 
than x(.)) is applied in (3.4), it is also reasonably probable to approximate H in a 
way that its eigenvalues to be well-distributed. 


As a result of our argument, and especially, in pursuit of the simplicity which is 
vital for large-scale cases, we propose the following modified NMF model: 


5(W, H) = =||A —WH|2 + ép(D),s.t.W,H = 0. (3.5) 


Additionally, the following revised versions of the least-squares models (3.2) and 
(3.3) should alternately be solved: 


H**1 arg min 3(W*, H), 
H20 
W**1=arg min 5(W, H**4), 
we0 


for all k = 0. The corresponding method here is called improved ANLS (IANLS) 
method as well. Note that the cost function (3.5) reduces to the traditional NMF 
cost function (3.1) when € = 0. 


4. Numerical tests 


To show support for our theoretical analysis, here we provide some experimental 
evidence across the computational spectrum. To proceed, we have selected 42 test 
functions of the CUTEr [15] with n => 50, as listed in Table 1. All the tests have 
been performed in MATLAB version 7.14.0.739 (R2012a) installed on a 
computer AMD FX-9800P RADEON R7, with 12 COMPUTE CORES 4C+8G 
2.70 GHz of CPU and 8 GB of RAM, by the Centos 6.2 server Linux operation 
system. Firstly, we have compared the performance of the HS—based techniques 
including the SHS method and the scaled HS method of [19] (PSHS) with @, = 1. 
Then, we have compared the performance of the PRP-based techniques including 
the SPRP method and the scaled PRP method of [33] (SPRP-WYW). We used the 
approximate Wolfe conditions of [16] with similar settings. The algorithms were 
stopped by similar conditions as given in [2] with |]g,|] < 10°C + |f,|). We 
have scrutinized the effect of the parameters a and b in (2.7) on the performance 
of SHS and SPRP by evaluating the outputs for different choices 
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a,b € {0- 1k}}2,. As a result, we recognized the values (a. b) = (0.7, 0.6) for 
SHS and SPRP as the best. 


To judge the performance of the algorithms visually, we applied the Dolan— 
Moré performance profile [11] following the notations of [2], on the factors of the 
total number of function and gradient evaluations (TNFGE) [16], and the CPU 
time (CPUT). Results of the performance comparisons have been depicted in 
Figures 1 and 2. They confirm that our matrix-based approach can be capable of 
delivering progress for the performance of the traditional CG algorithms such as 
HS and PRP. 


(a) TNFGE (b) CPUT 
Figure 1. Performance profile plots for SHS and PSHS 


‘SPRP 
~ = = -PRP-WYW - - - “SPRP-WYW 


1 1.5 2 25 3 3.5 4 4.5 5 1 15 2 25 3 3.5 4 45 5 


(a) TNFGE (b) CPUT 
Figure 2. Performance profile plots for SPRP and SPRP—W YW 
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Table 1. Test functions information 


Function n Function n 
ARGLINA 200 FLETCHCR 1000 
BROYDN7D 5000 FMINSRF2 5625 
DIXMAANA 3000 FMINSURF 5625 
DQRTIC 5000 GENHUMPS 5000 
EG2 1000 MANCINO 100 
ENGVALI 5000 MOREBV 5000 
EXTROSNB 1000 MSQRTBLS 1024 
FLETCBV2 5000 NCB20B 5000 
PENALTY2 200 NONCVXU2 5000 
PENALTY3 200 SINQUAD 5000 
QUARTC 5000 TOINTGOR 50 
SCHMVETT 5000 BQPGABIM 50 
TOINTGSS 5000 BQPGASIM 50 
VARDIM 200 BRATUID 5003 
VAREIGVL 50 DMN15102 66 
ARGLINB 200 DMN37143 99 
CURLY30 10000 DRCAVILQ 4489 
DECONVU 63 DRCAV2LQ 4489 
EIGENALS 2550 DRCAV3LQ 4489 
EIGENBLS 2550 FLETCBV3 5000 
ERRINROS 50 FLETCHBV 5000 


The final part of our experiments is devoted to evaluating 


the performance of 


SHS and SPRP for the ANLS and IANLS techniques. Setting € = 1 in the IALNS 
strategy, we adopted the stopping condition of [21]. Using a uniform distribution, 
test matrices were randomly generated for different dimensions and a similar 
approach was used to estimate the initial NMF elements, based on the suggestions 
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of [1]. Outputs have been reported in Tables 2 and 3, including the condition 
number (Cond) and RelErr, calculated by 


||A-WH le 


RelErr = all 


To summarize the outputs, it is noteworthy that condition numbers of H and 
RelErr show that IANLS performs better than ANLS. Therefore, it can be 
concluded that IANLS is capable of producing well-conditioned NMF element H 
that are accurate and of acceptable quality. 


5. Conclusions 


Probable uphill search directions may push the traditional conjugate gradient 
algorithms to the brink of default. To defeat such a computationally troubling 
issue, we developed simple spectral versions of the traditional conjugate gradient 
algorithms. Our study is principally based on the matrix aspects, analysing the 
eigenvalue features of the search direction matrix. We concisely addressed the 
convergence of the given methods using the common assumptions in the 
literature. 


The focus of the next part of our research is on making possible modifications to 
a classic optimization model of the nonnegative matrix factorization problem. 
According to the available information, it is a significant problem that frequently 
arises across a wide range of practical contexts. Our improved model attempts to 
rule out the possibility of ill-conditioning in the factorization trajectory by using a 
classical measure function. 


To examine the performance of the given methods, some computational tests 
have been carried out on the CUTEr functions. The outputs have been assessed 
based on the well-known Dolan—Moré benchmark. Furthermore, an evaluation of 
the quality of our improved nonnegative matrix factorization model has been 
conducted on several random cases. According to the results, the improved model 
can produce well-conditioned factorization elements with a reasonable relative 
error. Therefore, our theoretical assertions were supported by computational 
experiments. 
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Table 2. The outputs of SHS for NMF 


Dimensions (m, n, r) | METHOD Cond H RelErr 
(50, 50, 4) ALNS 1.99E+00 6.51E-04 
IALNS 1.30E+00 6.14E-04 
(100, 50, 5) ALNS 2.05E+00 6.07E-04 
IALNS 1.20E+00 5.84E-04 
(100, 100, 5) ALNS 2.21E+00 6.45E-04 
IALNS 1.23E+00 6.44E-04 
(100, 250, 5) ALNS 2.97E+00 9.05E-04 
IALNS 1.29E+00 8.04E-04 
(200, 200, 4) ALNS 1.52E+00 7.25E-04 
IALNS 1.23E+00 7.03E-04 
(200, 200, 8) ALNS 3.26E+00 8.43E-04 
IALNS 1.32E+00 8.24E-04 
(200, 300, 6) ALNS 1.91E+00 8.75E-04 
IALNS 1.20E+00 7.50E-04 
Table 3. The outputs of SPRP for NMF 
Dimensions (m, n, r) | METHOD Cond H RelErr 
(50, 50, 4) ALNS 2.21E+00 7.07E-04 
IALNS 1.28E+00 6.99E-04 
(100, 50, 5) ALNS 1.52E+00 6.14E-04 
IALNS 1.26E+00 6.10E-04 
(100, 100, 5) ALNS 2.92E+00 7.71E-04 
IALNS 1.26E+00 7.50E-04 
(100, 250, 5) ALNS 4,24E+00 1.20E-03 
IALNS 1.40E+00 9.81E-04 
(200, 200, 4) ALNS 2.44E+00 7.64E-04 
IALNS 1.24E+00 7.28E-04 
(200, 200, 8) ALNS 2.19E+00 9.28E-04 
IALNS 1.29E+00 9,.25E-04 
(200, 300, 6) ALNS 2.27E+00 1.09E-03 
IALNS 1.34E+00 9.60E-04 
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